Motivation

The initial aim of this project was to examine the number of COVID-19 cases in each county in Florida. I decided to create this project for one underlying purpose: to use the rayshader package in R to create 3D models of my ggplot graphs. I also wanted to get more experience with geospatial analysis and decided that focusing on COVID-19 data and the state that I live in would provide great practice for experimenting with the sf, maps, and rnaturalearth packages.

However, after completing the primary objectives of this project, I wanted to go beyond visualizations and analyze my data alongside external datasets. While researching the ongoing pandemic, I collected various articles that described COVID-19 as a discriminatory virus, and I decided to extend this project by building regression models for three categories identified by the CDC to determine how well various factors explain the number of COVID-19 cases and whether the virus can be viewed as discriminatory. I will also attempt to answer which of these categories (explained below) contributes the most to the number of COVID-19 cases in the state of Florida. The independent demographic variables are gathered from the Census Bureau and, due to the scope of my project, these variables cover only the state of Florida and the counties within it.

The main article that I will be analysing in this project is published by the Centers for Disease Control and Prevention (CDC). It describes the disproportionate burden of COVID-19 and how minority groups carry a heavier share of it. An excerpt from the article states,

“A recent CDC MMWR report included race and ethnicity data from 580 patients hospitalized with lab-confirmed COVID-19 found that 45% of individuals for whom race or ethnicity data was available were white, compared to 59% of individuals in the surrounding community. However, 33% of hospitalized patients were black, compared to 18% in the community, and 8% were Hispanic, compared to 14% in the community. These data suggest an overrepresentation of blacks among hospitalized patients. Among COVID-19 deaths for which race and ethnicity data were available, New York City identified death rates among black/African American persons (92.3 deaths per 100,000 population) and Hispanic/Latino persons (74.3) that were substantially higher than that of white (45.2) or Asian (34.5) persons.”

This excerpt illustrates the disparity between minority groups and whites in the share of people hospitalized with COVID-19.

The article goes on to provide some factors that may explain the disparity, which I will analyse as they pertain to the state of Florida. These factors fall into three main categories: “Living Conditions,” “Work Circumstances,” and “Underlying Health Conditions and Lower Access to Care.” The article concludes by providing some solutions to the needs of vulnerable populations during the pandemic.


Data Manipulation and Data Cleaning

Although most of the data manipulation/cleaning was performed in R, minor edits were made in PostgreSQL and Excel before importing the datasets into the global environment.

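# Packages used in this section
library(readr)   # read_csv()
library(dplyr)   # filter(), arrange(), joins, %>%
library(sf)      # st_as_sf(), st_area()
library(maps)    # map()

data<-read_csv("C:/Users/alvin/Desktop/EconometricsLAB/us-counties.csv")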


# Case counts for each Florida county as of 2020-06-05, largest first
florida<-data %>% filter(state=="Florida") %>% filter(date=="2020-06-05") %>% arrange(desc(cases)) 

# Remove the "Unknown" row by name (rather than by position) so the rows match the counties in the "counties" table
florida <- florida %>% filter(county != "Unknown")


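# Use %in% for membership; == against a vector recycles it element-wise and silently drops rows
topfive<-data %>% filter(state=="Florida") %>% filter(county %in% c("Miami-Dade","Broward","Palm Beach","Hillsborough","Orange"))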

census<-read_csv("C:/Users/alvin/Desktop/EconometricsLAB/Quickfacts2.csv")

#census <- census %>%   # Switched columns with rows to be able to perform inner_join with flcases dataset
  #gather("county", "value", 2:ncol(census)) %>%
  #spread(Fact, value)
#world <- ne_countries(scale = "medium", returnclass = "sf")

counties <- st_as_sf(map("county", plot = FALSE, fill = TRUE))
counties <- subset(counties, grepl("florida", counties$ID))
counties$area <- as.numeric(st_area(counties))

names(counties)[1] <- "county" 
counties$county<- as.character(counties$county) #inner_join on character type

#stopwords = readLines('C:/Users/alvin/Desktop/EconometricsLAB/stopwords.txt')     # stop words file
#typeof(counties$county)
#counties$county<- removeWords(counties$county,stopwords)     #Remove stopwords that were preventing me from using a join function
counties$county<-substring(counties$county,9,100)

counties$county<-toupper(counties$county)
florida$county<-toupper(florida$county)
census$Geographic_Area<-(toupper(census$Geographic_Area))
names(census)[1] <- "county" 

# After trimming the "florida," prefix and upper-casing the key on both sides, a regular inner_join() works; the fuzzyjoin package is not needed
flcases<-inner_join(counties,florida, by="county") 

#I originally used the fuzzyjoin package because the keys differed in case. Applying substring() to keep only the county name (omitting the "florida," prefix) and then toupper() to each dataset avoids it, and also avoids the stopwords/readLines() approach commented out above.


flcovidcensus<-left_join(census,florida,  by="county")
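
The key clean-up described in the comments above can be sketched in base R. The snippet uses a few made-up key strings in the `state,county` style, strips the state prefix, upper-cases both sides, and then lists the keys that still fail to match. The vectors `map_ids` and `case_names` are hypothetical stand-ins, not the project's data, and the punctuation mismatch is only an assumption about the kind of difference such a check might surface.

```r
# Hypothetical stand-ins for counties$ID and florida$county (not the real data)
map_ids    <- c("florida,alachua", "florida,palm beach", "florida,st johns")
case_names <- c("Alachua", "Palm Beach", "St. Johns")

# Drop everything up to and including the first comma, then upper-case
map_keys  <- toupper(sub("^[^,]*,", "", map_ids))
case_keys <- toupper(case_names)

# Keys present on one side only; an inner join would silently drop these rows
unmatched_in_map   <- setdiff(map_keys, case_keys)
unmatched_in_cases <- setdiff(case_keys, map_keys)

print(unmatched_in_map)    # "ST JOHNS"
print(unmatched_in_cases)  # "ST. JOHNS"
```

Running a check like this before the join makes it visible when a county is lost to a spelling difference rather than to missing data.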

Cases in Florida Counties

List of Most Cases in Florida by County
MIAMI-DADE 19055
BROWARD 7572
PALM BEACH 6857
HILLSBOROUGH 2554
ORANGE 2209
LEE 2199
COLLIER 1874
DUVAL 1736
PINELLAS 1478
MANATEE 1162
POLK 1127
ESCAMBIA 871
MARTIN 810
VOLUSIA 791
OSCEOLA 732
SARASOTA 658
ST. LUCIE 561
HENDRY 521
SEMINOLE 515
CHARLOTTE 481
BREVARD 439
LEON 433
PASCO 416
CLAY 393
ALACHUA 392
LAKE 317
JACKSON 298
GADSDEN 288
MARION 272
ST. JOHNS 270
HAMILTON 261
SUMTER 261
SANTA ROSA 256
OKALOOSA 249
DESOTO 245
LIBERTY 218
FLAGLER 195
SUWANNEE 174
PUTNAM 172
COLUMBIA 167
INDIAN RIVER 154
HIGHLANDS 141
HARDEE 131
CITRUS 125
WALTON 124
HERNANDO 122
OKEECHOBEE 122
BAY 115
MONROE 110
NASSAU 82
GLADES 81
WASHINGTON 79
MADISON 72
CALHOUN 64
DIXIE 58
LEVY 55
BRADFORD 52
WAKULLA 35
UNION 34
HOLMES 30
JEFFERSON 30
BAKER 29
TAYLOR 27
GILCHRIST 19
LAFAYETTE 10
GULF 3
FRANKLIN 2

Based on the visual above, there appears to be a positive correlation between the number of COVID-19 cases and the total population of each county in Florida. This is intuitive: the larger the population, the more opportunities for interaction and transmission, and thus the higher the number of COVID-19 cases.



Visualizing COVID-19 Cases for Florida Counties in 3D

Given the 3D geodata model and the top-down view of the 2D graph, the counties with the highest COVID-19 case counts are concentrated in the southeastern part of Florida. If there is indeed a correlation between COVID-19 cases and minority groups, this concentration may be related to migration from the many countries and territories that lie to the southeast of Florida, including Cuba, Jamaica, Haiti, and Puerto Rico.



Top 5 Counties

These five counties have the most cases in Florida to date, but they are not all that similar to one another. There are two interesting takeaways from the graph above:

  1. The counties differ in how their case counts progressed over time: Hillsborough had a gradual increase, whereas Miami-Dade had an exponential increase around April. What led to the exponential increase in cases that is not found in other counties?

  2. Although Palm Beach, Orange, and Hillsborough have similar population sizes, Palm Beach has an evidently higher case count than the other two counties. This suggests that factors other than total population may contribute to the number of COVID-19 diagnoses.


Correlation Between COVID-19 & Racial/Ethnic Minority Groups

Before I begin testing different factors that can affect the risk of being infected with COVID-19, I want to determine whether there is a correlation between COVID-19 and racial/ethnic minority groups at all, or whether my earlier 3D model can be explained by the difference in population between counties. To do this, I collected Census Bureau demographic data for all counties in Florida. Instead of collecting data for each minority group (which would result in many columns, one per group), I collected the data for those who identify as white alone, that is, white and only one race. This excludes people who are white and of two or more races.

At first it seems that our data contradicts the Centers for Disease Control and Prevention’s (CDC) article and that there may be a positive correlation between COVID-19 cases and white Floridians. However, after creating a similar graph using the percentage of white Floridians instead of the count, it seems that there is indeed a negative correlation between COVID-19 cases and white Floridians. The first graph was misleading because counties with larger populations logically have a higher number of cases (as we saw in the circular barplot). If we are to examine how COVID-19 affects racial/ethnic minority groups, then we must express each group as a percentage of the population to fully understand the impact COVID-19 has on different racial groups.
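
The sign flip between the count-based and percentage-based graphs can be reproduced with a toy example. The numbers below are made up for five hypothetical counties and are not taken from the Census or case data; they only illustrate how a count that scales with county size can correlate positively with cases even when the corresponding share correlates negatively.

```r
# Made-up numbers for five hypothetical counties (illustration only)
population  <- c(1000, 2000, 5000, 10000, 50000)
white_share <- c(0.90, 0.85, 0.80, 0.70, 0.60)  # proportion identifying as white alone
cases       <- c(1, 3, 10, 30, 200)

white_count <- population * white_share

cor_count <- cor(white_count, cases)  # dominated by county size
cor_share <- cor(white_share, cases)  # county size removed from the predictor

cat("count vs cases:", round(cor_count, 2), "\n")  # positive
cat("share vs cases:", round(cor_share, 2), "\n")  # negative
```

In this sketch the count inherits its positive correlation from total population, which is why the percentage is the more informative predictor.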

Since the share of white residents is negatively correlated with COVID-19 cases, which implies a positive correlation between the share of minority residents and cases, we will examine various quantitative factors that the CDC has outlined as possible reasons for the correlation.


Graphs and Regression Modeling for Each Factor Category

Health differences between racial and ethnic groups are often due to economic and social conditions that are more common among some racial and ethnic minorities than whites. In public health emergencies, these conditions can also isolate people from the resources they need to prepare for and respond to outbreaks.

Living Conditions

Multi-generational households, which may be more common among some racial and ethnic minority families, may find it difficult to take precautions to protect older family members or isolate those who are sick, if space in the household is limited.

Racial and ethnic minority groups are over-represented in jails, prisons, and detention centers, which have specific risks due to congregate living, shared food service, and more.

\[ \text{Number\_of\_Cases} = \beta_0 + \beta_1 \cdot \text{Multigenerational\_Households} + \beta_2 \cdot \text{Correctional\_Facilities} + u \]

## 
## Call:
## lm(formula = cases ~ Households_sixty + Correctional_facilities, 
##     data = flcovidcensus)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -4566.4  -717.4   -66.5   730.0 10541.2 
## 
## Coefficients:
##                          Estimate Std. Error t value Pr(>|t|)    
## (Intercept)               75.2984  1324.3099   0.057    0.955    
## Households_sixty         -22.3434    25.5094  -0.876    0.384    
## Correctional_facilities    0.7716     0.1117   6.906 2.72e-09 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1972 on 64 degrees of freedom
## Multiple R-squared:  0.4405, Adjusted R-squared:  0.4231 
## F-statistic:  25.2 on 2 and 64 DF,  p-value: 8.48e-09
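
As a reading aid for the coefficient table above, the `t value` and `Pr(>|t|)` columns can be recomputed from the estimate, the standard error, and the residual degrees of freedom. This is a minimal sketch that uses the rounded values printed in the summary, so the results agree with the table only up to rounding; the variable names are my own.

```r
# Values copied from the summary above (rounded as printed)
df_resid <- 64  # residual degrees of freedom

# Correctional_facilities row
t_corr <- 0.7716 / 0.1117
p_corr <- 2 * pt(-abs(t_corr), df = df_resid)    # two-sided p-value

# Households_sixty row
t_house <- -22.3434 / 25.5094
p_house <- 2 * pt(-abs(t_house), df = df_resid)

print(round(c(t_corr, t_house), 3))
print(signif(c(p_corr, p_house), 3))
```

Only the correctional facilities coefficient is distinguishable from zero at conventional levels in this model; the multigenerational household coefficient is not.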

Work Circumstances

Critical workers: The risk of infection may be greater for workers in essential industries who continue to work outside the home despite outbreaks in their communities, including some people who may need to continue working in these jobs because of their economic circumstances. Nearly a quarter of employed Hispanic and black or African American workers are employed in service industry jobs compared to 16% of non-Hispanic whites. Hispanic workers account for 17% of total employment but constitute 53% of agricultural workers; black or African Americans make up 12% of all employed workers but account for 30% of licensed practical and licensed vocational nurses.

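Since the Census Bureau reports information about both firms and establishments, I decided to use establishments in this analysis because of how the Census Bureau defines an establishment: a single physical location at which business is conducted or where services or industrial operations are performed.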

\[ \text{Number\_of\_Cases} = \beta_0 + \beta_1 \cdot \text{Essential\_Industries} + u \]

## 
## Call:
## lm(formula = cases ~ establishment, data = flcovidcensus)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -3024.2  -184.8   211.7   390.8  6658.6 
## 
## Coefficients:
##                 Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   -385.53814  172.00109  -2.241   0.0286 *  
## establishment    0.75633    0.04805  15.741   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1195 on 62 degrees of freedom
##   (3 observations deleted due to missingness)
## Multiple R-squared:  0.7999, Adjusted R-squared:  0.7966 
## F-statistic: 247.8 on 1 and 62 DF,  p-value: < 2.2e-16
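
For a regression with a single predictor, the F-statistic is the square of the slope's t value, and R-squared can be recovered from F and the residual degrees of freedom. The sketch below checks the output above against those two identities, using the rounded values as printed; the variable names are my own.

```r
t_slope  <- 15.741  # t value for establishment, from the summary above
df_resid <- 62      # residual degrees of freedom

f_stat    <- t_slope^2                     # F on 1 and df_resid degrees of freedom
r_squared <- f_stat / (f_stat + df_resid)  # R-squared recovered from F

print(round(f_stat, 1))     # close to the reported 247.8
print(round(r_squared, 4))  # close to the reported 0.7999
```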

Underlying Health Conditions and Lower Access to Care

Not having health insurance: Compared to whites, Hispanics are almost three times as likely to be uninsured, and African Americans are almost twice as likely to be uninsured. In all age groups, blacks are more likely than whites to report not being able to see a doctor in the past year because of cost. Inadequate access is also driven by a long-standing distrust of the health care system, language barriers, and financial implications associated with missing work to receive care.

\[ \text{Number\_of\_Cases} = \beta_0 + \beta_1 \cdot \text{Uninsured} + u \]

## 
## Call:
## lm(formula = cases ~ Uninsured, data = flcovidcensus)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -3185.3  -463.0   308.3   537.2  4105.4 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -8.141e+02  2.287e+02   -3.56 0.000995 ***
## Uninsured    3.477e-02  2.134e-03   16.29  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1170 on 39 degrees of freedom
##   (26 observations deleted due to missingness)
## Multiple R-squared:  0.8719, Adjusted R-squared:  0.8686 
## F-statistic: 265.5 on 1 and 39 DF,  p-value: < 2.2e-16

All Categories Combined

\[ \text{Number\_of\_Cases} = \beta_0 + \beta_1 \cdot \text{Multigenerational\_Households} + \beta_2 \cdot \text{Correctional\_Facilities} + \beta_3 \cdot \text{Essential\_Industries} + \beta_4 \cdot \text{Uninsured} + u \]

## 
## Call:
## lm(formula = cases ~ Households_sixty + Correctional_facilities + 
##     establishment + Uninsured, data = flcovidcensus)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -2754.9  -641.6   102.5   669.8  2926.4 
## 
## Coefficients:
##                           Estimate Std. Error t value Pr(>|t|)    
## (Intercept)             -2.404e+03  8.700e+02  -2.763  0.00907 ** 
## Households_sixty         3.088e+01  1.646e+01   1.876  0.06900 .  
## Correctional_facilities  1.699e-02  1.066e-01   0.159  0.87425    
## establishment           -5.341e-01  2.828e-01  -1.889  0.06726 .  
## Uninsured                5.752e-02  1.207e-02   4.765 3.25e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1112 on 35 degrees of freedom
##   (27 observations deleted due to missingness)
## Multiple R-squared:  0.8958, Adjusted R-squared:  0.8839 
## F-statistic: 75.22 on 4 and 35 DF,  p-value: < 2.2e-16
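
Two numbers in the output above are worth unpacking: how many counties the combined model actually used, and how the adjusted R-squared follows from the multiple R-squared. The helper `adj_r_squared()` below is my own small function, not part of any package, and the inputs are the rounded values printed in the summary.

```r
# Adjusted R-squared from R-squared, residual df, and number of coefficients
adj_r_squared <- function(r2, df_resid, n_coef) {
  n <- df_resid + n_coef  # observations used in the fit
  1 - (1 - r2) * (n - 1) / df_resid
}

# Combined model: 4 predictors plus an intercept, 35 residual df, 27 rows dropped
n_used  <- 35 + 5
n_total <- n_used + 27

adj <- adj_r_squared(r2 = 0.8958, df_resid = 35, n_coef = 5)

cat("counties used:", n_used, "of", n_total, "\n")
cat("adjusted R-squared:", round(adj, 4), "\n")
```

The combined model is fitted on 40 of the 67 counties because of missing values, whereas the living-conditions model uses all 67, so R-squared values across the four models are computed on different samples.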

Conclusion

Before I analyzed the factors that cause the disproportion of COVID-19 cases between whites and minority groups, I analyzed the correlation between population and COVID-19 cases to rule out the possibility that the disparity simply reflects population size, since many minority groups tend to live in populous cities and counties. After my analysis, I conclude that COVID-19 does discriminate.

Overall, with all the independent variables included, about 88% of the variation in county-level COVID-19 cases (adjusted R-squared of 0.8839) can be explained by the number of multigenerational households, correctional facilities, essential industries, and people without health insurance. The CDC has successfully outlined correlated variables that increase cases of COVID-19 and have a vital impact on vulnerable racial/ethnic minority groups in Florida. Going by the different categories, “Underlying Health Conditions and Lower Access to Care” showed the strongest relationship with COVID-19 cases, with the highest R-squared (0.8719) of the three single-category models.

Some improvements that could be implemented in the future to improve the accuracy of my analysis include replacing the outdated Census Bureau estimates with the actual 2020 census data. I am currently unable to do so because that data is still being collected.

An important note concerning the findings is that although I have independent variables from each factor category, there are still many other factors that either could not be quantified or are not recorded in the Census data. This includes data on residential segregation, which would fall under the Living Conditions category. Other factors that I could not collect for the analysis include paid sick leave, underlying medical conditions, stigma and systemic inequalities, and population density for each county. If more information becomes available, I will update the markdown file in its GitHub repository.

Concerning code improvements, I may update the HTML file with a circular barplot that is one stacked plot instead of two distinct graphs, to better show the proportion between the number of cases and county population.